fix(fuzz): address fuzzer failures for step-invariant and count_values #697

Open
sylr wants to merge 2 commits into thanos-io:main from sylr:fix/fuzzer

@sylr sylr commented Mar 20, 2026

Summary

  • Skip sample validation for expressions using the @ modifier, since the Thanos step invariant operator caches results via sync.Once and replays without re-counting samples per step (telemetry difference, not correctness)
  • Exclude count_values from fuzzer validation, since it stringifies float values into labels, amplifying last-digit float64 precision differences into label mismatches
  • Add fuzzer corpus entry for FuzzNativeHistogramQuery

Test plan

  • FuzzNativeHistogramQuery/df60c8694f1c9759 passes
  • FuzzEnginePromQLSmithRangeQuery/59c2955b78c86bab passes
  • All other fuzzer corpus entries pass
  • Full go test ./... passes (no regressions)

🤖 Generated with Claude Code

sylr and others added 2 commits March 20, 2026 17:04
The native histogram fuzzer found a samples-per-step mismatch when
queries use the @ modifier (e.g. predict_linear({...}[2m] @ 0.000)).
The Thanos engine's step invariant operator caches the first evaluation
and replays it for subsequent steps without re-counting samples, while
Prometheus counts samples at every step. This is an optimization
difference, not a correctness issue.
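
The counting difference described above can be reproduced with a minimal sketch. The `stepInvariant` type, its fields, and the two counting helpers are hypothetical stand-ins written for illustration; they are not the Thanos operator's actual code. The sketch only assumes what the commit message states: evaluation is guarded by `sync.Once`, and samples are counted only when that evaluation runs.

```go
package main

import "sync"

// stepInvariant is a hypothetical stand-in for an operator that evaluates
// its inner expression once and replays the cached result at every step.
type stepInvariant struct {
	once    sync.Once
	cached  float64
	samples int // samples counted so far
	eval    func() (value float64, samples int)
}

// Next returns the value for one step. Only the first call runs eval,
// so only the first call adds to the sample counter.
func (s *stepInvariant) Next() float64 {
	s.once.Do(func() {
		v, n := s.eval()
		s.cached = v
		s.samples += n
	})
	return s.cached
}

// countCached runs the given number of steps through the cached operator
// and returns the total number of samples it reported.
func countCached(steps, samplesPerEval int) int {
	op := &stepInvariant{eval: func() (float64, int) { return 1.0, samplesPerEval }}
	for i := 0; i < steps; i++ {
		op.Next()
	}
	return op.samples
}

// countPerStep models an engine that re-evaluates, and therefore
// re-counts, the same samples at every step.
func countPerStep(steps, samplesPerEval int) int {
	total := 0
	for i := 0; i < steps; i++ {
		total += samplesPerEval
	}
	return total
}
```

With 5 steps and 8 samples per evaluation, `countCached` reports 8 while `countPerStep` reports 40, even though both produce the same value at every step.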

Detect @ modifier usage via VectorSelector.Timestamp/StartOrEnd fields
and skip sample comparison for those expressions.
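
A detection along these lines might look like the sketch below. The `Expr`, `VectorSelector`, and `Call` types are simplified, hypothetical stand-ins so the example stays self-contained; the real check would walk the Prometheus parser AST instead. Only the `Timestamp` and `StartOrEnd` field names come from the commit message, and modeling `StartOrEnd` as an `int` is an assumption.

```go
package main

// Expr is a hypothetical, simplified expression node.
type Expr interface {
	children() []Expr
}

// VectorSelector mirrors only the two fields named in the commit message.
type VectorSelector struct {
	Name       string
	Timestamp  *int64 // set for "@ <timestamp>"
	StartOrEnd int    // assumption: non-zero models "@ start()" / "@ end()"
}

func (*VectorSelector) children() []Expr { return nil }

// Call is a hypothetical function-call node such as predict_linear(...).
type Call struct {
	Func string
	Args []Expr
}

func (c *Call) children() []Expr { return c.Args }

// usesAtModifier reports whether any vector selector in the tree
// carries an @ modifier.
func usesAtModifier(e Expr) bool {
	if e == nil {
		return false
	}
	if vs, ok := e.(*VectorSelector); ok {
		if vs.Timestamp != nil || vs.StartOrEnd != 0 {
			return true
		}
	}
	for _, c := range e.children() {
		if usesAtModifier(c) {
			return true
		}
	}
	return false
}
```

A fuzzer harness could call `usesAtModifier` on the parsed query and skip only the samples comparison when it returns true, while still comparing the query results themselves.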

Constraint: step invariant operator uses sync.Once for caching, samples only counted on first batch
Rejected: Fix sample counting in step invariant operator | would add overhead for non-functional stat tracking
Confidence: high
Scope-risk: narrow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>

count_values converts float values to string labels, amplifying tiny
floating point precision differences between engines into label
mismatches. For example, stdvar_over_time may compute 61.24999999999997
vs 61.24999999999998 — both correct within float64 precision, but
producing different count_values labels.
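
The amplification can be illustrated with the two values quoted above. This is a minimal sketch: `labelFor` and `nearlyEqual` are hypothetical helper names, and the sketch assumes the label is produced with shortest round-trip formatting (`strconv.FormatFloat` with precision -1), which may differ from what either engine does internally.

```go
package main

import (
	"math"
	"strconv"
)

// labelFor formats a sample value as a label string using the shortest
// decimal representation that round-trips to the same float64.
// Assumption: this stands in for how count_values builds its label.
func labelFor(v float64) string {
	return strconv.FormatFloat(v, 'f', -1, 64)
}

// nearlyEqual reports whether a and b agree within a relative tolerance,
// the kind of comparison a fuzzer can apply to sample values.
func nearlyEqual(a, b, relTol float64) bool {
	if a == b {
		return true
	}
	return math.Abs(a-b) <= relTol*math.Max(math.Abs(a), math.Abs(b))
}
```

For `a = 61.24999999999997` and `b = 61.24999999999998`, `nearlyEqual(a, b, 1e-12)` holds, yet `labelFor(a)` and `labelFor(b)` are different strings. A tolerance-based comparison therefore accepts the values but cannot reconcile the labels derived from them.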

Constraint: float64 has ~15-16 significant digits; different evaluation order yields different last-digit results
Rejected: Fix float precision in stdvar_over_time | both values are equally valid within IEEE 754
Confidence: high
Scope-risk: narrow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr>
@sylr sylr marked this pull request as ready for review March 24, 2026 08:00